Parsing TCT with a Coarse-to-fine Approach

Authors

  • Dongchen Li
  • Xihong Wu
Abstract

A key observation is that concept compound constituent labels are detrimental to parsing performance. We use a PCFG parsing algorithm with a multilevel coarse-to-fine scheme. Our approach requires a sequence of nested partitions, or equivalence classes, of the PCFG nonterminals, where the nonterminals of each PCFG are clusters of nonterminals of the next finer PCFG. We use the results of parsing at a coarser level to prune the next finer level; that is, the coarse-to-fine method uses hierarchical projections for incremental pruning. We present experiments showing that parsing with hierarchical state-splitting is fast and accurate on the Tsinghua Chinese Treebank (TCT). In addition, we propose a multiple-model method that adds concept compound labels to the output of the simple PCFG model and gains higher bracketing recall from the simple model. This scheme can be implemented by training two models on different labeling styles.
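The pruning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label names, the two-level projection, the posterior values, and the threshold are all hypothetical, chosen only to show how a coarse pass can restrict which fine nonterminals are considered for each span.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int]  # (start, end) word positions

# Hypothetical nested partition: each fine nonterminal belongs to one coarse cluster.
PROJECTION: Dict[str, str] = {
    "NP": "Nominal", "NN": "Nominal",
    "VP": "Verbal", "VV": "Verbal",
}


def allowed_fine_items(
    coarse_posteriors: Dict[Tuple[Span, str], float],
    projection: Dict[str, str],
    threshold: float,
) -> Dict[Span, Set[str]]:
    """Return, per span, the fine labels whose coarse cluster survived pruning."""
    allowed: Dict[Span, Set[str]] = {}
    for (span, coarse_label), posterior in coarse_posteriors.items():
        if posterior < threshold:
            continue  # the coarse item is pruned, so all its refinements are too
        for fine_label, cluster in projection.items():
            if cluster == coarse_label:
                allowed.setdefault(span, set()).add(fine_label)
    return allowed


# Made-up posteriors from a coarse parsing pass over a three-word sentence.
coarse_posteriors = {
    ((0, 2), "Nominal"): 0.9,
    ((0, 2), "Verbal"): 1e-6,
    ((2, 3), "Verbal"): 0.8,
}
allowed = allowed_fine_items(coarse_posteriors, PROJECTION, threshold=1e-4)
```

In a multilevel scheme the same filter would be applied once per level, with the posteriors from each pass restricting the chart of the next finer grammar.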


Related resources

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of natural-language sentences and produces a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...


Coarse-To-Fine Parsing for Expressive Grammar Formalisms

We generalize coarse-to-fine parsing to grammar formalisms that are more expressive than PCFGs and/or describe languages of trees or graphs. We evaluate our algorithm on PCFG, PTAG, and graph parsing. While we achieve the expected performance gains on PCFGs, coarse-to-fine does not help for PTAG and can even slow down parsing for graphs. We discuss the implications of this finding.


Unsupervised Learning for Natural Language Processing

Given the abundance of text data, unsupervised approaches are very appealing for natural language processing. We present three latent variable systems which achieve state-of-the-art results in domains previously dominated by fully supervised systems. For syntactic parsing, we describe a grammar induction technique which begins with coarse syntactic structures and iteratively refines them in an ...


Multilevel Coarse-to-Fine PCFG Parsing

We present a PCFG parsing algorithm that uses a multilevel coarse-to-fine (mlctf) scheme to improve the efficiency of search for the best parse. Our approach requires the user to specify a sequence of nested partitions or equivalence classes of the PCFG nonterminals. We define a sequence of PCFGs corresponding to each partition, where the nonterminals of each PCFG are clusters of nonterminals o...


Efficient Parsing for Transducer Grammars

The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY-style parsing algorithm despite leaving most of the grammar in n-ary form. Second, we show how t...



Publication date: 2012